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Abstract 

Primary vertex finding for collider experiments is studied. The efficiency and precision of finding interaction 
vertices can be improved by advanced clustering and classification methods, such as agglomerative clustering with 
fast pairwise nearest neighbor search, followed by Gaussian mixture model or k-means clustering. 
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1. Introduction 

In this paper one-dimensional vertex finding for minimum bias proton-proton collisions is studied, with a special 
care for detecting all inelastic interactions, even in high pile-up. For the procedure clean primary particles are needed 
with good estimate of their starting longitudinal coordinate z, at the point of closest approach to beam line, as well as 
its standard variation cr z . The selection of particles is often based on the value of the impact parameter d of the track 
and its estimated standard deviation <xj, e.g. requiring d < 3cr rf (beam spot constraint). 

Finding the vertex or vertices of inelastic interactions in a collider is essential for physics analyses. The aim is to 
identify primary vertices efficiently with low background. Several methods and algorithms have been developed in 
the past. 

One of them [1] works well for single, high multiplicity events (e.g. heavy-ion collisions). It first estimates the 
vertex position by the centroid of z coordinates of hits left in the detector. It makes use of the correlation between the 
hits in separate layers and estimates the location of the vertex. 

Another one [2] takes all reconstructed tracks which are compatible with the beam spot. The mode of their z 
coordinates is used as vertex candidate. Tracks not compatible with the vertex are discarded, the rest is used for an 
adaptive multi-vertex fit. The remaining tracks are used to find a new vertex candidate. The procedure continues until 
a minimum of two tracks compatible with a common vertex is reached. The method has very high efficiency to find 
the correct vertex in a sample of it events. 

This article is organized as follows: Sec.|2]introduces a standard method of primary vertex finding, defines some 
performance measures and discusses its optimization. Sec.[3]shows the possible application of hierarchical clustering, 
while Sec.|4]deals with more sophisticated methods, such as Gaussian mixture models and k-means clustering. The 
details of the Monte Carlo simulation are given in Sec. [5] The comparisons of performance and timing are shown in 
Sec. [6] This work ends with conclusions. 

2. The standard method 

In a third method ^ primary vertex candidates are obtained by clustering selected tracks according to their z 
coordinate. The so called divisive method looks for large intervals without tracks and divides the z axis into several 
regions. For each region the vertex position is computed with a weighted average of all compatible tracks. Tracks not 
compatible with that average vertex position are discarded, but they are recovered to make a new vertex candidate. 
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Figure 1: Optimization of the standard method for n mm = 2 for several K s j m values. The contour lines of X 2 are drawn, the place (+) and value of 
the minimum are indicated. 



First, the tracks are ordered according to increasing z value. The ordered list is scanned to form a cluster until at 
least n m i„ consecutive tracks separated by more than z sep are found, at which point another cluster is built. The position 
of each of these primary vertex clusters is determined iteratively. A cleaning procedure is applied, rejecting the tracks 
farther from the estimated primary vertex position Zk than n a standard deviations (\z - Zk\ < ■ cr z ). The positions of 
the primary vertices are recomputed with the remaining tracks. The procedure iterated until each remaining track is 
compatible with its associated cluster according to the above criterion. 

For specific signal events the settings n a - 5, z sep — 50 - 500 /mi and « m! „ = 2 appeared to be most successful. The 
efficiency for reconstructing and correctly tagging the primary vertex of the signal event depends on the number of 
charged tracks produced at the vertex, and for low luminosity it ranges from 76% to to 99% depending on the signal 
process studied. 

In the following comparisons the above mentioned method will be used as standard vertex finder since it is best 
suited for finding multiple vertices opposed to others (Sec.[TJ which search only for the signal vertex. 

2.1. Measures of performance 

A vertex is associated to another vertex if more than half of its tracks are shared. This way a reconstructed 
(simulated) vertex can be associated to a simulated (reconstructed) vertex: the number of possible associations is zero 
or one. A simulated vertex is reconstructed n times if there are n reconstructed vertices which are associated to it. The 
efficiency gives the fraction of simulated vertices which are at least once reconstructed. The lost vertex rate shows 
the fraction of simulated vertices which are not reconstructed. The split vertex rate shows the fraction of simulated 
vertices which are more than once reconstructed. A reconstructed vertex is a fake if it has no associated simulated 
vertices. A reconstructed vertex is merged if it has more than one associated simulated vertices. These quantities can 
be studied as a function of fixed or Poisson distributed event-by-event pile-up. 

2.2. Optimization of the standard method 

The standard algorithm can be optimized by choosing /?„„■„, n a and z S ep sucn that the performance of vertex finding 
is best. As it was presented in Sec.|2]the roles of the parameters are 

• «„„■„: minimum number of tracks required to form a cluster. In order to have low bias, it is reasonable to look at 
settings n mi „ - 2 and 3. 

• ng-: a track is compatible with a vertex if it is closer than n a times the estimated standard deviation of z. 
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Figure 2: Optimization of the standard method for n m ,„ = 3 for several K s j,„ values. The contour lines of X 2 are drawn, the place (+) and value of 
the minimum are indicated. 



• Zsep- maximum distance between two adjacent tracks that belong to the same initial cluster. 

The measure of goodness, the merit function X 2 to minimize, can be chosen as the sum of the fractions of lost and 
split simulated vertices and that of the fake and merged reconstructed vertices: 

-^2 _ I K-losi + Kmulti Kfake + K mergei j \ 

The X 2 values for several settings, using 10 4 inelastic proton-proton events, were calculated on a grid: K S j m — 1, 
2, 4 and 8; n a - 2, 2.5, . . . , 5; z sep = 0.1, 0.15, . . . , 0.8 cm; w m ,„ = 2, 3. The obtained contour lines and the places 
of the minima are shown in Fig. [TJ(n m! > 1 =2) and Fig.[2](Hmm=3). The best values are also given in TableQ] Both the 
optimal n a and z sep depend on the real vertex multiplicity K sm . In case of a single simulated vertex a big n lT and z sep 
is needed for good reconstruction. It turns out that n m ,„=3 gives comparable or better performance for all K S j m values 
than those with « ml „=2. The setting 

n mi „ = 3, n a = 3.0, z sep = 0.3 cm 

is an acceptable compromise. Hence for the standard method these values were used in the following studies. 



3. Hierarchical clustering 

Since the track multiplicities of vertices greatly vary, for efficient classification (Sec. |4| the precise and unbiased 
designation of cluster centers is essential. The most widely used methods for hierarchical clustering are agglomerative 
methods yj. They start by connecting individual data points into small clusters, then connect those clusters, and so 
forth. The pairwise nearest neighbor method (PNN) using weighted averages proved to be a highly successful method 
for agglomerative clustering. In this work an advanced implementation, the fast pairwise nearest neighbor method 
(fPNN) |5j], was used where the list of closest neighbors is kept and updated. 

The distance d of two tracks i and j is defined as 

2 _ fa ~ Zj) 2 

a 2 + ( r 2 
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Table 1: Optimized parameters for the standard method. Best ;v, Zsep and minimal X 2 values for settings « m ,„ = 2 and 3 using 1, 2, 4 or 8 simulated 
vertices. 
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Figure 3: Optimization of the agglomerative clustering method for several simulated vertex multiplicities. The lines are drawn to guide the eye. 



Note that d is essentially the log-likelihood of the event that the two tracks belong to the same vertex. The clustering 
starts with all reconstructed tracks, each of them being a cluster with a single track. At each step the clusters with the 
smallest distance are found and the two clusters are joined: 

_ Zilot+ZjlQ 3 - 
Z ~ I/O* + I/O 2 .' 

The procedure is repeated until only some K clusters remain and the resulting vertex positions are passed to other 
classification methods for further refinement. More on the selection of optimal K value is given in Sec. [4] 

The K value can also be chosen, and the clustering can also be stopped, if at a step the smallest distance gets 
bigger than a given number d max . The merit value X 2 as function of d max for several simulated vertex multiplicities 
Ksim = 1, 2, 4 and 8 is shown in Fig. [3] The choice of d m i n « 8 appeared to give the best result. For performance plots 
of this method see Sec. |6] (labeled with fPNN). 

Another agglomerative clustering method, neighbor joining, was also tested but provided much poorer results and 
it was not studied further. 
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4. Classification with unsupervised learning 

4.1. Gaussian mixture model 

Gaussian mixture models are examples of classification by unsupervised learning where the solution is achieved 
by series of expectation and maximization steps [4]. We have N tracks with estimated longitudinal coordinates z n at 
their closest approach to the beam-line and their expected standard deviations cr„ (n = 1,2, ... ,N). If the number of 
interaction vertices K in a bunch crossing is given, the task is to find the means of the longitudinal coordinates ik and 
their weights, fraction of attached to all tracks, P(k) of each vertex (k — 1,2,..., K). 

The likelihood of finding the tracks at positions z„ is a product 

n 



or a sum 



^ 2 = -21ogX=-2^1ogP(z„) (2) 

n 

where P(z n ) is the so called the mixture weight for track n. It can be split into contributions from each vertex 

P{Z„) = Y J P ten,<T n $ k )P(k) 
k 



where the conditional probability is a Gaussian 



P(z„,o-„\zk) 



1 



cr„ V27r 

The probability that track n came from interaction vertex k is 



exp 



(Z n ~ Zk) 

lo-l 



Pnk 



P(z„,o-„\h)P(k) 

P(Zn) 



where p is also known as the responsibility matrix. In summary, in the expectation step p can be calculated for 
given values of means z.k and weights P(k) for all vertices. The procedure starts by using the results of agglomerative 
clustering for K vertices (Sec. [3]). During the maximization step, these means and weights are estimated from p as 



Zk = 



Z/7 PnkZn 
Zn Pnk 



P(k) = 



Zn Pnk 

N 



The expectation step followed by the minimization step will increase the likelihood value, or decrease^ 2 . Thus 
repeated iterations will converge to an extremum. In practice the process can be stopped if x 1 decreased only by 
a small amount or the number of iterations exceeded some limit. For performance plots of this method see Sec. [6] 
(labeled with fPNN+GaussM). 



4.2. The k-means clustering 

The k-means clustering is a simplification of the Gaussian mixture model 04J]. Tracks do not get assigned to 
clusters in a probabilistic way but they can be attached to one and only one of them. In the expectation step the tracks 
are assigned to the cluster k which has the closest mean p^. In the minimization step the means pu are re-estimated 
using the averaged z-coordinate of tracks assigned to cluster k. The process is stopped if the expectation step does 
not change the assignment of any track. The k-means clustering is simpler than the Gaussian mixture model, it is fast 
and converges rapidly, with somewhat reduced performance. For performance plots of this method see Sec.|6](labeled 
with fPNN+kMeans). 
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Figure 4: Left and center: Pseudo-rapidity and transverse momentum distributions of produced charged particles from inelastic proton-proton 
collisions at yfs = 10 TeV according to Pythia 6.4. The assumed acceptance window (|;;| < 2.5 and pj < 0.1 GeV/c) is indicated by the grey 
vertical lines. The event-by-event number distribution of accepted particles are shown right. The three curves correspond to contributions from 
single-, double diffractive and non-diffractive processes. 



4.3. Estimating the number of primary vertices 

Since the number of primary vertices K is not known in advance, it has to be determined from data. Starting with 
K — 1, the value of X 2 (K) as function of K can be examined. If adding a new vertex decreases^- 2 by some substantial 
amount, the addition can be regarded successful, otherwise the vertices found in the previous step should be retained. 

If there were indeed K real vertices in the bunch crossing, the expected value of x 2 (K) assuming distinct vertices, 
that is, non-overlapping clusters of tracks, is 

n " k 

where n^ are the number of tracks associated to the vertices as result of the optimization described in Sections [3] and 
[4] x 1 nas a shifted chi-squared distribution with degrees of freedom 

P(x 2 \K) = f(x 2 -A;N) 



where the shift is 




The task is to compare the merit values based on individual track probabilities (Eq. (f2]i) and the above calculated 
expected values (Eq. (O). If x 2 (K) is compatible with x 2 (K) with some confidence, K can be regarded as a good 
estimate of the number of interaction vertices. A sensible stopping condition is 

P(X 2 -A;N) > 10~ 3 . 

4.4. Use of priors 

If the shape of the interaction region or the interaction rate is unknown, we have no additional information, the 
priors are constant. Otherwise the z-distribution of vertices P{zk) and the number distribution of primary vertices 
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Figure 5 : The effect of multiple scattering. The first silicon layer is shown by the thick horizontal band (not on scale), the position of the primary 
vertex is indicated by V. For details see the text in Sec. 15. II 



P(K\K > 1) can be incorporated into the optimization. Their distributions are well described by a Gaussian and a 
Poissonian, respectively: 
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P(K\K > 1) = 



Kl 1 - e-v 



where zir and <tir are the mean and the standard deviation of the longitudinal coordinate of the interaction region, p 
is the average number of interaction vertices per bunch crossing. The contributions of the priors corresponding to the 
interaction region (IK) and the number of reconstructible inelastic interactions (int) to x 2 are 
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5. Simulation 

Inelastic proton-proton collisions were simulated using the Pythia event generator [6J at a center-of-mass energy 
of 10 TeV (including single-, double-diffractive and non-diffractive events). In order to take into account the limited 
acceptance of central detectors, only those primary charged particles were used for vertex finding which had \rj\ < 2.5 
and pr > 0. 1 GeV/c. The shape of the interaction region in z was approximated by a Gaussian with 5 cm standard 
deviation. The distribution of pseudo-rapidity, pj and event-by-event multiplicity are shown in Fig. |4] 

At least two compatible tracks are required to form an interaction vertex candidate (n* > 2) which eliminates most 
of the background coming from low pj spiralling particles looping in the strong magnetic field, decay products of 
weakly-decaying long lived resonances (Kg, A and A), y conversions, and particles from secondary interactions. 

5.1. Resolution of z position 

The resolution of z is governed by the local position resolution of the closest situated (pixel) silicon detectors and 
the effect of multiple Coulomb scattering. 

2 2 _2 

In this study <T pos = 50 pm is assumed. The expected standard deviation of z due to multiple scattering can be written 
in the form 

r r 13.6 MeV / x 

0~ms — , — #0 — , ~ + — : — ~ 

sin 2 6 sin 2 6 Ppc \ x o Sln 6 
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Figure 6: Efficiency of vertex finding for single events, as a function of track multiplicity, for several discussed vertex finding methods. The line is 
a binomial fit to the values corresponding to the standard algorithm. 

where 6 is the polar angle, 6q is the standard deviation of the multiple scattering angle, r is the radial distance of the 
first pixel layer, x/Xq is the thickness of the layer in radiation length units. For detailed derivation see Fig. [5] Here the 
values r — 4 cm, x/Xq = 3% were used. 

6. Results 

The performance of several discussed vertex finding methods was compared using 10 s inelastic events. Only those 
simulated and reconstructed vertices were taken into account which had at least two tracks. 

The efficiency of vertex finding for single events, as a function of track multiplicity, for several discussed vertex 
finding methods is shown in Fig. [6] In case of the standard algorithm the values were fitted with a binomial distribution, 
giving a true probability p sta „d ar d = 0.87. 

The result as a function of vertex multiplicity in a bunch crossing is shown in Fig. [7] The left column deals with 
simulated vertices by showing the fraction lost, singly reconstructed and split ones, from top to bottom. The fraction of 
lost vertices increases approximately linearly with vertex multiplicity K S j m , the improved methods have about 3 times 
fewer lost vertices. The rate of split vertices is flat. While it is at 1% for the standard method, it drops to 0.1% for the 
proposed ones. The right column takes into account reconstructed vertices by looking at the fraction of fake, correctly 
found and merged ones, from top to bottom. The fraction of fake vertices is again increases approximately linearly 
with reconstructed vertex multiplicity K rec , the improved methods have about 5 times fewer fakes. The rate of merged 
vertices has a linear behavior which is similar for both standard and improved methods. It can be understood since 
with increasing multiplicity the vertices get closer and it is more and more difficult to separate them. It it clear that 
the improved methods have better performance in all examined variables. The fPNN search, the k-means clustering 
and the Gaussian mixture model all provide very similar values. 

The resolution of the longitudinal coordinate of the reconstructed primary vertex as a function of track multiplicity 
is shown in Fig. [8]-left. It is practically independent of the vertexing method used and scales as N~° M . (The small 
difference for N-5 is due to the high efficiency of the proposed methods at very low track multiplicity.) A measure 
of efficiency, the average fraction of lost vertex tracks, as a function of vertex multiplicity is plotted in Fig. |8}right. 
It shows that even in case of a single vertex, the standard method loses about 10% of the tracks, while the improved 
ones keep all of them. The loss increases approximately linearly with K S j„, multiplicity, the improved methods have 
about 4 times fewer lost vertex tracks. 
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Figure 7: Comparison of performance for several discussed vertex finding methods. The left column deals with simulated vertices, by showing 
those which have zero (lost vertex), one (singly reconstructed) and more than one (split vertex) associated reconstructed vertices. The right column 
takes into account reconstructed vertices, by looking at those which have zero (fake vertex), one (correctly found) or more than one (merged vertex) 
associated simulated vertices. The lines are drawn to guide the eye. 
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Figure 8: Left: Resolution of the longitudinal coordinate of the reconstructed primary vertex as a function of track multiplicity in case of single 
events, for several discussed vertex finding methods. The line is a power-law fit to the values corresponding to the Gaussian mixture algorithm, its 
exponent acaussM is indicated in the figure. Right: Average fraction of lost vertex tracks as a function of vertex multiplicity, for several discussed 
vertex finding methods. The lines are drawn to guide the eye. 
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Figure 9: Processing time per bunch crossing for several discussed methods and processes on a 1.6 GHz CPU. Left: single events, as a function of 
track multiplicity. Right: multiple events, as a function of vertex multiplicity. The exponents of the power-law fits for the standard (a st andard) and 
fPNN methods (ff/rajv) are indicated in the figure. 
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Figure 10: Sensitivity of the Gaussian mixture method to background tracks. Comparison of the baseline, underestimated or overestimated cr,, and 
uncertain o% scenarios as a function of vertex multiplicity, based on the merit value X 2 , for two «,„,„ settings (2 or 3). 



6.1. Timing 

For online applications and fast event reconstruction it is important to control the timing of vertex finding. Pro- 
cessing times per bunch crossing for several discussed methods and processes are shown in Fig. [9] Measured values 
on a 1.6 GHz CPU are given for the standard method, the fPNN clustering, as well as for the k-means and Gaussian 
mixture models. In case of the improved sequence, both for single events and pile-up, the largest contribution to tim- 
ing comes from the clustering phase. For single events (Fig.|9}left), the required time per interaction has a power-law 
dependence on the track multiplicity, the exponents being similar for the standard (a stan dard = 2.2) and the improved 
method (a/pm = 2.4). The processing times are essentially the same. In case of multiple events per bunch crossing 
(Fig. (9}right), pile-up, the power-law scaling is steeper for the improved method (afpNN = 2.6) with respect to the 
standard one {a sta ndard = 1 -9). This amounts to a 5 times slower processing in case of 10 simulated vertices, where the 
total time per bunch crossing is 4 ms, but that is still acceptable. 

6.2. Sensitivity 

The stability of the results was also tested, since the performance of the proposed method can be sensitive to 

• the amount of background tracks compatible with the beam-line (see Sec. [5). In the test their number was set to 
2% of the primary multiplicity. 

• systematic shift in the estimated standard deviation of the z value of tracks. In the test cr, was uniformly 
increased and decreased by 10%. 

• random shifts in the estimated standard deviation of the z value of tracks. In the test cr z was track-by-track 
varied according to a normal distribution with 10% standard deviation. 

The comparisons based on the merit value X 2 are shown in Fig. [TOl-left. In case of the addition of background 
tracks the slightly worse performance originates from the increase of fake vertex rate. The underestimation of cr z leads 
to increased fake and split vertices for lower vertex multiplicity. The overestimation of cr, gives higher rate of lost and 
merged vertices for higher vertex multiplicity. The effects can be compensated by increasing the parameter «„„„ to 3 
(Fig.fTUl-right). Random shifts in cr. have practically no effect on the performance. 
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7. Conclusions 



It was shown that finding interaction vertices for collider detectors can be improved by using advanced cluster- 
ing and classification methods, such as agglomerative clustering, followed by Gaussian mixture model or k-means 
clustering. The improvement is present already for single events but it is most pronounced for pile-up. The better 
performance for minimum bias proton-proton collisions means less lost vertices (one third), very few split, and less 
fake vertices (one fifth). The number of lost vertex tracks is decreased as well (one fourth). The scaling of the timing 
and the sensitivity of the proposed method were studied. 

Acknowledgements 

The author wishes to thank to Wolfgang Adam, Krisztian Krajczar for helpful discussions. This work was sup- 
ported by the Hungarian Scientific Research Fund and the National Office for Research and Technology (K 48898, 
H07-B 74296). 

References 

[1] A. Badala, et al., Vertex finding in ALICE by the use of silicon pixel layers in the inner tracking system, Nucl. Instrum. Meth. A485 (2002) 

100-104. doi: 10. 1016/S0168-9002 (02) 00538-7 

[2] M. J. Costa, Vertex and track reconstruction in ATLAS, Nucl. Instrum. Meth. A582 (2007) 785- 789. |doi : 10 ■ 1016/ j . nima ■ 2007 . 10 . 027 1 
[3] W. Adam, Track and vertex reconstruction in CMS, Nucl. Instrum. Meth. A582 (2007) 781-784. doi : 10 . 1016/j .nima. 2007. 07. 091 
[4] W. H. Press, B. P. Flannery, S. A. Teukolsky, W. T. Vetterling, Numerical Recipes: The Art of Scientific Computing; 3rd ed., Cambridge Univ. 
Press, Cambridge, 2007. 

[5] P. Franti, T. Kaukoranta, D.-F. Shen, K.-S. Chang, Fast and memory efficient implementation of the exact PNN, Image Processing, IEEE 

Transactions on 9 (5) (2000) 773-777. |doi : 10 ■ 1 109/83 ■ 8415161 

[6] T. Sjostrand, S. Mrenna, P. Skands, PYTHIA 6.4 Physics and Manual, JHEP 05 (2006) 026. arXiv:hep-ph/0603175 



12 



